RESEARCH ARTICLE 



Highly Recombinant VGII Cryptococcus gattii Population Develops 
Clonal Outbreak Clusters through both Sexual Macro evolution and 
Asexual Microevolution 

R. Blake Billmyre,^ Daniel Croll,'' Wenjun Li,^ Piotr IVlleczkowski,^ Dee A. Carter,'' Christina A. Cuomo,^ James W. Kronstad,*" 
Joseph Heitman^ 

Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, USA"; The Michael Smith Laboratories, Department of 
Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada''; Department of Genetics, School of Medicine, University of North 
Carolina, Chapel Hill, North Carolina, USA^; School of Molecular Bioscience, University of Sydney, Sydney, New South Wales, Australia^; Broad Institute of MIT and Harvard, 
Cambridge, Massachusetts, USA= 

ABSTRACT An outbreak of the fungal pathogen Cryptococcus gattii began in the Pacific Northwest (PNW) in the late 1990s. This 
outbreak consists of three clonal subpopulations: VGIIa/major, VGIIb/minor, and VGIIc/novel. Both VGIIa and VGIIc are 
unique to the PNW and exhibit increased virulence. In this study, we sequenced the genomes of isolates from these three groups, 
as well as global isolates, and analyzed a total of 53 isolates. We found that VGIIa/b/c populations show evidence of clonal expan- 
sion in the PNW. Whole-genome sequencing provided evidence that VGIIb originated in Australia, while VGIIa may have origi- 
nated in South America, and these were likely independently introduced. Additionally, the VGIIa outbreak lineage may have 
arisen from a less virulent dade that contained a mutation in the MSH2 ortholog, but this appears to have reverted in the VGIIa 
outbreak strains, suggesting that a transient mutator phenotype may have contributed to adaptation and evolution of virulence 
in the PNW outbreak. PNW outbreak isolates share genomic islands, both between the clonal Uneages and with global isolates, 
indicative of sexual recombination. This suggests that VGII C. gattii has undergone sexual reproduction, either bisexual or uni- 
sexual, in multiple locales contributing to the production of novel, virulent subtypes. We also found that the genomes of two 
basal VGII isolates from HIV"*" patients contain an introgression tract spanning three genes. Introgression substantially contrib- 
uted to intra- VGII polymorphism and likely occurred through sexual reproduction with VGI. More broadly, these findings illus- 
trate how both microevolution and sexual reproduction play central roles in the development of infectious outbreaks from avir- 
ulent or less virulent progenitors. 

IMPORTANCE Cryptococcus gattii is the causative agent responsible for ongoing infections in the Pacific Northwest of the United 
States and western Canada. The incidence of these infections increased dramatically in the 1990s and remains elevated. These 
infections are attributable to three clonal lineages of C. gattii, VGIIa, VGIIb, and VGIIc, with only VGIIa identified once previ- 
ously in the Pacific Northwest prior to the start of the outbreak, albeit in a less virulent form. This study addresses the origin and 
emergence of this outbreak, using whole-genome sequencing and comparison of both outbreak and global isolates. We show 
that VGIIa arose mitoticaUy from a less virulent clonal group, possibly via the action of a mutator phenotype, while VGIIb was 
likely introduced from Australia, and VGIIc appears to have emerged in the United States or in an undersampled locale via sex- 
ual reproduction. This work shows that multiple processes can contribute to the emergence of an outbreak. 
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Emergence and expansion of new pathogens are among the 
most important current paradigms in infectious diseases (1). 
Consequently, understanding the mechanisms by which these 
outbreaks develop is paramount. For some pathogens, emergence 
is enabled by changes in the host population, such as expansion of 
the immunocompromised population as a consequence of the 
HIV/AIDS epidemic and the rise in the number of organ trans- 
plant recipients. This has dramatically increased the number of 
annual infections caused by opportunistic pathogens like Crypto- 



coccus neoformans, particularly among HIV/ AIDS patients in sub- 
Saharan Africa (2). Likewise, the recent emergence of zygomycete 
pathogens has been driven by the expansion of a vulnerable host 
population (3). 

Alternatively, emergence can be driven by changes in the 
pathogen population, often through genomic variation. In the 
case of the yearly outbreak of seasonal influenza, antigenic drift, 
whereby viral lineages undergo positive selection altering the an- 
tigenic target of the immune system, allows the virus to evade not 
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only the immunity established in infected individuals the year 
before but also that of immunized individuals (4) . While this pro- 
cess allows the small changes required for continual, seasonal re- 
emergence, the antigenic shift that occurs by reassortment when 
two or more viruses exchange components while in the same host 
has been implicated in the emergence of large influenza outbreaks, 
including the 1918 pandemic and the 2009 HlNl epidemic (5, 6). 
These much greater genomic reassortments (here termed macro- 
evolution) can lead to emergence of pathogens with novel charac- 
teristics, including enhanced transmission and virulence. 

In eukaryotes, recombination between lineages occurs via mei- 
otic sexual cycles. This can have profound consequences for the 
development of successful pathogens. In Toxoplasma, two ancient 
lineages recombined to generate the modern hypervirulent clonal 
groups (7), and in a contemporary setting, both outcrossing and 
"selling" have been implicated in outbreaks of Toxoplasma and 
related pathogens (8). Sexual reproduction has also recently been 
shown to provide de novo short-term variation in C. neoformans 
by dramatically increasing the incidence of aneuploidy (9). Alter- 
natively, mitotic mutation can contribute to the emergence of 
pathogens. In C. neoformans, serial growth under laboratory con- 
ditions or in animals can lead to attenuation (10) or elevation of 
virulence (11), respectively. In both bacteria and fungi, mutation 
can be further enabled by development of a hypermutator state, 
often through mutations in DNA repair pathways. In yeast, loss of 
MSH2, which encodes a critical mismatch repair protein, results 
in instability in repeat tracts (12), which commonly occur in cell 
surface proteins (13), and can affect recognition by the host im- 
mune system. In C. neoformans, a hypermutator state has been 
shown to increase resistance to amoeba by enabling mutations in 
the RAM pathway and enabling pseudohyphal growth ( 14). 

Cryptococcus gattii is a basidiomycete fungus and a pathogen of 
humans. Unlike its sister species C. neoformans, two of the sub- 
types of C. gattii, VGI and VGII, predominantly infect immuno- 
competent individuals. The VGII subtype is causing an ongoing 
outbreak in the Pacific Northwest (PNW) of the United States and 
in British Columbia in Canada (15-17). This outbreak originated 
on Vancouver Island in the late 1990s and has been expanding 
geographically over the ensuing decade (15, 17). Recent human 
and veterinary cases of C. gattii show the outbreak is continuing to 
expand south and east from the Pacific Northwest (18, 19). Mul- 
tilocus sequence typing revealed that the outbreak consists of two 
subtypes of VGII, termed VGIIa and VGIIb. The VGIIb subtype is 
also found in other parts of the world, especially Australia, and has 
reduced virulence in animal models, while VGIIa is unique to the 
Pacific Northwest (with the exception of one isolate from Brazil 
that differed at one of seven multilocus sequence typing [MLST] 
markers examined) and exhibits increased virulence (20). Addi- 
tionally, these subtypes share approximately 50% of their MLST 
aEeles, suggesting that VGIIa may have beenamore virulent prog- 
eny or sibling of VGIIb (20). Subsequent sampling identified a 
third, even more virulent subtype designated VGIIc, which is, so 
far, completely unique to the Pacific Northwest outbreak in Ore- 
gon (21). Recent work based on MLST shows that the VGII lineage 
itself originated ancestrally in South America (22); however, the 
proximal source of the outbreak strains is still unknown. Australia 
(20) and South America (20, 22) have both been proposed as 
geographical sources of origin. 

Previous studies used a multilocus sequence typing (MLST) 
approach to test a maximum of 30 loci (20) . This inherently covers 



only a fraction of the true diversity that is present. Dramatic re- 
ductions in the cost of whole-genome sequencing over the past 
several years have made whole-genome sequencing of multiple 
isolates tractable. However, the few studies that have incorporated 
this approach have primarily used it in comparison to or to sup- 
plement MLST (22, 23). In this study, we utilized whole-genome 
sequencing to address the clonality of individual outbreak lin- 
eages, to examine the possibility of recombination in the popula- 
tion, and to determine the origin of the outbreak isolates. Our 
findings reveal that the clonal clusters in the Pacific Northwest 
originated through sexual reproduction within the highly sexual 
VGII population but were introduced independently of each 
other. Furthermore, the VGIIc lineage likely developed high viru- 
lence through sexual recombination of alleles, while VGIIa under- 
went mitotic microevolution, potentially driven by a mutator 
phenotype that arose following ancestral rounds of sexual repro- 
duction. 

RESULTS 

The genomes of 38 C. gattii isolates of the VGII genotype were 
sequenced to initiate an investigation of recombination and clon- 
ality. In addition, sequences for 17 previously published genomes 
from the CDC were obtained from the NCBI Sequence Read Ar- 
chive (SRA) (23). The strains analyzed are summarized in Table 1. 
Briefly, genomic sequencing targeted three sets of strains thought 
to be clonal, representing the VGIIa, VGIIb, and VGIIc groups. In 
addition, 3 VGIIa-like, 2 VGIIb-like, and 18 other VGII isolates 
not from these groups were sequenced, including 3 MATa isolates. 
In sum, 11 VGIIa, 9 VGIIb, 10 VGIIc, and 23 other genomes 
sequenced in this study or previously published were compared, 
with two isolates sequenced by both the CDC and this study for a 
total set of 53 VGII whole-genome sequences. Sequences gener- 
ated in this study are available in the SRA. 

Outbreak populations of C. gattii are highly clonaL Analysis 
of whole-genome sequencing of C. gattii isolates revealed that the 
population consists of clonal subpopulations, each of which 
shares a mating type and is differentiated within the cluster by very 
limited diversity. Five clonal clusters were identified: the previ- 
ously known VGIIa, VGIIb, and VGIIc groups; an additional clus- 
ter restricted to the Northern Territory of Australia comprised of 
isolates NT3, NT7, NT8, RDH2, and RDH7 (referred to as VGIInt 
[24]); and a MATa clonal cluster comprised of isolates LA499, 
CBS1930, and VBGcll (referred to as VGIIMATa). After elimi- 
nation of erroneous polymorphic site calls as a result of improper 
mapping of repetitive sequences, a total of 150 polymorphisms 
remained within the VGIIa group, and 288 remained within the 
VGIIb group, 130 within the VGIIc group, and 2,164 within the 
Australian VGIInt group. As most of the polymorphic alleles were 
private (restricted to a single isolate), any two given isolates within 
a group were separated by considerably fewer sites, ranging from a 
minimum of 14 to a maximum of 49 for VGIIa. VGIIb isolates 
were separated by 11 to 61 polymorphisms, excluding one outlier 
isolate B8554 (99 sites). Phylogenetic trees were constructed using 
single nucleotide polymorphisms (SNPs) derived from the whole 
genome. In the case of both VGIIa and VGIIb, comparison with 
the most closely related isolates that were still distinguishable by 
MLST analysis revealed substantial differences (Fig. lA and B). 
For VGIIa, the South American isolate ICB107 branched outside 
the VGIIa cluster from the Pacific Northwest. Similarly, the Ca- 
ribbean isolate 99/473-1 and the Florida isolate B9588 fell outside 
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TABLE 1 C. gattii genomes sequenced" 



Population and strain 



Origin 



Mating type 



Sequenced by 



VGIIa 
R265 
RBI 
RB59 
EJB17 

EJB21 (orB7467) 

B7395 

B7436 

R498 

R265 reseq 
B7422 
B8849 
B8577 



Vancouver Island, clinical 
Vancouver Island, environmental 
Vancouver Island, environmental 
Oregon, veterinary 
Oregon, clinical 
Washington, clinical 
California, veterinary 
Vancouver Island, veterinary 
Vancouver Island, clinical 
Oregon, veterinary 
Oregon, environmental 
British Columbia, environmental 



MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 



Broad Institute 
Heitman lab 
Heitman lab 
Heitman lab/CDC 
Heitman lab 
Heitman lab/CDC 
Heitman lab 
Heitman lab 
Heitman lab 
CDC 
CDC 
CDC 



VGIIa-like 
ICB107 
CBS7750 
NIH444 



Brazil, clinical 

San Francisco, environmental 
Seattle, clinical, 1975 



MATa 
MATa 
MATa 



Heitman lab 
Heitman lab 
Heitman lab 



VGIIb 
R272 
NT13 
B7394 
B7735 
B8554 
B8828 
RDH6 
V9 
V6 
V26 



Vancouver Island, clinical 

Australia, Northern Territory 

Washington, veterinary 

Oregon, clinical 

Oregon, veterinary 

Washington, veterinary 

Australia, Northern Territory, clinical 

Australia, veterinary 

Australia, veterinary 

Australia, veterinary 



MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 



Heitman lab 

Heitman lab 

CDC 

CDC 

CDC 

CDC 

Heitman lab 
Heitman lab 
Heitman lab 
Heitman lab 



VGIIb-like 
99/473-1 
B9588 



Caribbean Islands, clinical 
Florida, clinical 



MATa 
MATa 



Heitman lab 
Heitman lab 



VGIIc 
B8571 
B8843 
B8838 

B7466 (or EJB52) 

B7737 

B6863 

B7390 

B7432 

EIB87 

FIB 18 



Washington, clinical 
Oregon, clinical 
Washington, clinical 
Oregon, veterinary 
Oregon, clinical 
Oregon, clinical 
Idaho, clinical 
Oregon, clinical 
Oregon, veterinary 
Oregon, clinical 



MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 
MATa 



CDC 
CDC 
CDC 
CDC 
CDC 
CDC 
CDC 
CDC 

Heitman lab 
Heitman lab 



VGIIMATa 
LA499 
CBS1930 
VBGcll 



Colombia, clinical 
Aruba, veterinary 
Puerto Rico, environmental 



MATa 
MATa 
MATa 



Heitman lab 
Heitman lab 
Heitman lab 



VGIInt 
NT3 
NT7 
NT8 
RDH2 
RDH7 



Australia, Northern Territory, clinical 
Australia, Northern Territory, clinical 
Australia, Northern Territory, clinical 
Australia, clinical 

Australia, Northern Territory, clinical 



MATa 
MATa 
MATa 
MATa 
MATa 



Heitman lab 
Heitman lab 
Heitman lab 
Heitman lab 
Heitman lab 



VGII 
78-5-46 
ICB182 
ICB183 



Utah, clinical 
Brazil, clinical 
Brazil, environmental 



MATa 
MATa 
MATa 



Heitman lab 
Heitman lab 
Heitman lab 



(Continued on following page) 
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TABLE 1 (Continued) 



Populs-tion 3nd strs-in 




A'lating type 


Secjuenced by 


ICB184 


Brazil, environmental 


MATa 


Heitman lab 


WA861 


Western Australia, veterinary 


MATa 


Heitman lab 


WM178 


Sydney, Australia, clinical 


MATa 


Heitman lab 


IP96/1 120-1 


French, clinical 


MATa 


Heitman lab 


2003.125 


French, clinical 


MATa 


Heitman lab 


93.980 


French, clinical 


MATa 


Heitman lab 


98.1132 


Caribbean, clinical 


MATa 


Heitman lab 



" The strains utilized in this study are hsted. The source of the sequence is indicated, with "Heitman lab" designating isolates newly sequenced in this study. 



the VGIIb cluster otherwise shared between Austraha and the Pa- 
cific Northwest. VGIIc was not closely related to any isolates fi-om 
any geographic regions other than the PNW but did exhibit diver- 
sity at the whole-genome level, allowing an inference of popula- 
tion structure within the VGIIc clade (Fig. IC). 

Analysis of whole-genome sequencing revealed no evidence of 
ongoing recombination within the clonal clusters in the Pacific 
Northwest via a standard allele compatibility test. In the case of 
VGIIa, only two of the sequenced strains shared any polymor- 
phisms (Fig. lA). Instead the vast majority of SNPs did not show 
evidence of either sexual recombination or shared descent past the 
introduction into the Pacific Northwest. This likely indicates that 
sampling and sequencing of isolates has not reached saturation. 
VGIIb and VGIIc showed substantially more population struc- 
ture, with a number of isolates sharing SNPs; however, both 
groups still lacked evidence for intraclade recombination via a 
standard allele compatibility test. The much higher frequency of 
shared polymorphism within the VGIIb and lie groups suggests 
that sampling has been more complete in these groups or is more 
restricted to one subclade. 

Outbreak strains have diverse proximal origins. There has 
been a debate as to the geographic origin(s) of the lineages causing 
the Pacific Northwest outbreak. The VGIIa lineage was previously 
proposed to be the offspring of a cross between VGIIb and an 
unknown parent that may have taken place either in Australia or 
in the Pacific Northwest (20). Whole-genome analysis suggests 
that VGIIa from the Pacific Northwest is a clonal group derived 
from a shared common ancestor with limited diversity (150 SNPs 
within the entire group, with as few as 14 to 49 SNPs distinguish- 
ing any two isolates) (Fig. 2A) . In addition, there is a closely related 
isolate from South America, ICB107. This isolate differs from the 
PNW VGIIa isolates by 1,174 polymorphic sites, including inser- 
tion or deletion events (indels). These sites are distributed evenly 
across the genome (Fig. 2B), with no substantial blocks devoid of 
SNPs. This suggests that these SNPs have arisen during clonal 
propagation since the last common ancestor of these isolates un- 
derwent a meiotic event and that VGIIa likely shares a common 
lineage with ICB107. While VGIIa could have entered the PNW 
through Australia as previously hypothesized, the generation of 
this lineage likely occurred much longer ago, either in South 
America or in a different location, after which it dispersed to both 
South America and the PNW. The lack of a VGIIa-like isolate 
from Australia suggests that South America is a more likely prox- 
imal source. 

Importantly, VGIIb isolates from the Pacific Northwest are 
highly similar to isolates from Australia, with some isolates clus- 
tering more closely with Australian than with other PNW VGIIb 
strains (Fig. 3). With one exception, a CDC-sequenced isolate 



from Oregon (B8554), the Australian and PNW isolates appear to 
share a single origin. In contrast, isolates described as VGIIb from 
outside the PNW or Australia (i.e., Caribbean isolate 99/473-1 and 
a recent Florida clinical isolate, B9588) group outside this clonal 
lineage. It may be more accurate to describe these isolates as 
VGIIb-like, as they do not appear to fit within the clonal outbreak 
cluster encompassing both the PNW and Australian isolates and 
were likely introduced to the southeast United States in a separate, 
unrelated event. This evidence suggests that the VGIIb/minor 
component of the outbreak either arrived in the PNW from Aus- 
tralia or arrived in both the PNW and Australia at very similar 
times from a common progenitor population. 

VGIIa-like isolates have a frameshift mutation in a key DNA 
repair gene. The VGIIa population consists of two distinct 
groups: the outbreak strains in the Pacific Northwest and a less 
virulent group consisting of the strains ICB107, CBS7750, and 
NIH444 that we call VGIIa-like (20, 21). These strains were all 
isolated prior to the onset of the Pacific Northwest outbreak, and 
all are characterized by diminished virulence. A rough estimate 
based on the commonly accepted estimate of 0.0081 substitution 
per site per million years (25) suggests that these groups diverged 
91.7 years ago, although with such a small number of SNPs, this 
number could vary dramatically. As our results show that VGIIa 
and the VGIIa-like isolates do not appear to be separated by a 
recent meiotic event (Fig. 2B), we explored mitotic microevolu- 
tion as a potential cause of the increased virulence observed in the 
more recently isolated VGIIa strains. SNPs and indels differenti- 
ating the VGIIa and VGIIa-like strains were examined for poten- 
tial impact using SNPeff (25) (see Table SI in the supplemental 
material). Twelve missense mutations were identified, but only 
one frameshift mutation was found (Fig. 4A). This mutant was in 
the 4th of 19 exons of CNBG_1161, leaving only 42 amino acids 
unchanged out of the 964-amino-acid length. CNBG_1161 en- 
codes the C. gattii ortholog of Msh2, a MutS-like DNA mismatch 
repair protein characterized in Saccharomyces cerevisiae (12, 26) 
(Fig. 4B). The msh2 frameshift mutation is present in all three 
VGIIa-like isolates (ICB107, CBS7750, and NIH444), but the 
wild-type allele is present in all VGIIa outbreak isolates analyzed. 

Recombination is occurring between clonal clusters. While 
strains within each of the five clonal clusters of C. gattii are highly 
related to each other, comparisons among clusters revealed sub- 
stantial differences. However, the sites differentiating these clus- 
ters were not uniformly dense across the genome. This is demon- 
strated by incongruity between phylogenies inferred from 
different portions of the genome (Fig. 5), suggesting that ancestral 
recombination among the clonal lineages has resulted in some 
portions of the genome being more similar between two clonal 
groups than the others. Further evidence is provided by examin- 
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FIG 1 Phylogenetic reconstruction shows evidence for cfonal expansion in outbreak groups. Maximum likeliliood pliylogenies were constructed using a matrix 
of^ all available SNPs. Bootstrap values are based on 500 replicates. The scale bar indicates the number of substitutions per nucleotide position. (A) Phylogeny 
based on SNPs unique to VGIIa. ICB107 was also included as it differed at only one MLST marker in previous studies. (B) Phylogeny based on SNPs unique to 
VGIIb. 99/473-1 was also included as it was simUar to VGIIb in previous studies. (C) Phylogeny based on SNPs unique to VGIIc. 
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FIG 2 VGIIa outbreak isolates are more closely related to each other than to historical isolates or South American VGII isolates. (A) Prediction of VGIIa 
population structure based on haplotype network inference. Only SNPs were used to eliminate alignment artifacts. All Pacific Northwest isolates are predicted 
to share a relatively recent last common ancestor. The South American isolate ICB 1 07, while the closest among isolates included in this analysis, is more distantly 
related. (B) SNPs distinguishing ICB107 fi'om the VGIIa isolate R265 plotted across the genome. SNPs are distributed across the genome, suggesting ICB107 was 
not separated from VGIIa by a recent meiotic event. 



ing SNP density between isolates. This analysis demonstrates the 
presence of islands of identity between any two individual isolates 
where diversity is considerably lower than the genome-wide aver- 
age (Fig. 6A). Inferring phylogenies from these regions of de- 
pressed polymorphism reveals uniquely close relationships 
among strains (Fig. 6B). This suggests that sexual reproduction 
likely contributed to the production of the ancestor of each clonal 
cluster, with subsequent mitotic clonal expansion of these isolates 
to spread throughout the environment. Notably, none of the out- 
break clusters has an individual partner strain or cluster that con- 
tributes a large portion of the genome (i.e., 50% for one cross, 25% 
for two crosses, etc.). This suggests that either sampling has cap- 
tured only a small portion of the VGII population or some of these 
intermediate strains may no longer exist. 



Diploid RB59 isolate is homozygous. A diploid VGIIa iso- 
late RB59 was previously identified by fluorescence-activated 
cell sorter (FAGS) analysis (20). Using MLST markers, this 
strain did not appear to be a hybrid of two different VGII 
lineages. Whole-genome sequencing further revealed that this 
strain is homozygous for all identified SNPs and indels. This 
suggests that RB59 is the result of either endoreplication or 
mother-daughter cell-cell fusion, both of which are possible 
intermediates in the unisexual cycle. This would not produce 
offspring with substantial polymorphism, as both parental 
chromosomes would be identical in both copies, but it could 
facilitate the production of basidiospores, which might then 
serve as infectious propagules, or the production of aneuploid 
variant progeny (9, 27). 
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FIG 3 VGIIb outbreak isolates cluster more closely with Australian isolates than Caribbean and Florida isolates. Shown is a prediction of VGIIb population 
structure based on haplotype network inference. Pacific Northwest VGIIb isolates and Australian isolates cluster together, with the exception of the Pacific 
Northwest isolate B8554 sequenced by the CDC. 
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Analysis of possible recombination/gene conversion in the 
MATa locus. Recombination can occur via either bisexual or uni- 
sexual reproduction. However, the mating type locus is an excep- 
tion, as this ~100-kb region is highly rearranged and divergent 
between MATa and MATa isolates and recombination between a 
and a alleles will result in acentric or dicentric chromosomes. As a 
result, any evidence of crossover within this locus should be the 
result of mating between two isolates of the same mating type. 
This locus has a relatively robust population structure, with the 
VGIIa and VGIIb clonal groups being closely related compared to 
the other clonal clusters (see Fig. SIA in the supplemental mate- 
rial). 



We examined 995 SNPs within the MAT locus. Three SNPs 
broke apart otherwise conserved haplotypes and were suggestive 
of recombination via a standard allele compatibility test. Two of 
these lie within the coding sequence of the gene encoding Ste20 
(see Fig. SIB), while the other was approximately 40 kb away in the 
intergenic region between the SP014 and GEFl genes (not 
shown). All three SNPs were well supported, with high-quality 
scores, high mapping scores, and high depth of coverage. The 
intergenic variant between SP014 and GEFl lies in a region that is 
colinear between the a and a alleles and so could potentially stiU be 
the result of recombination during bisexual mating. This variant 
also lies close to a known a-a mating gene conversion hot spot in 
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C. neoformans (2 eventsout of 255 progeny tested) (28). However, 
the other pair hes within a gene that is diverged and rearranged 
between a and a alleles. Given the lack of flanking recombination 
tracts matching the new allele combinations, the most parsimoni- 
ous explanation is that one of the alleles underwent gene conver- 
sion during a-a unisexual mating. Alternatively, it is possible that 
homoplasic mutations may have contributed to the signature of 
gene conversion. We can estimate the frequency of homoplasic 
mutation by measuring the frequency of multiallelic SNPs com- 



pared with biallelic SNPs in the SNP calling set. There are 1,346 
multiallelic SNPs out of 223,743 total SNPs for a total frequency of 
0.60%. Mutations at a biallelic site should create homoplasic mu- 
tations half as often as those at a multiallelic site, so with 995 SNPs 
in the MATlocus, this suggests approximately 3 homoplasic SNPs 
may exist in this set. As a result, we cannot confidently distinguish 
the two possible explanations for the origin of these SNPs. 

The VGII group is recombining at the population level. In 
addition to the shared genomic islands, use of a paired allele com- 
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patibility test based on SNP information (AB X ab giving rise to 
AB, ab, Ab, or aB) provided evidence for sexual reproduction and 
recombination (Fig. 7A). One hundred random loci were selected 
using the Genome Analysis Toolkit (GATK) SelectVariants walker 
out of the 262,614 polymorphic sites in the data set. These loci 
were collapsed into unique and informative allele combinations, 
leaving 46 loci, and then compared. This test showed evidence for 
mating and meiosis occurring between a number of different iso- 
lates at the whole-genome level, both with and without the in- 
volvement of the three sequenced MATz. isolates in this study. In 
particular, all of the 46 loci tested showed evidence for recombi- 
nation with at least 13 of the 45 other loci (29%) and as many as 40 
out of 45 (89%) (Fig. 7B). Recombination provides strong evi- 
dence for the involvement of the sexual cycle in the production of 
these C. gattii strains, and given the paucity of a isolates in the 
environment, we hypothesize this involved both a-a bisexual and 
QL-cL unisexual reproduction. 

Genomic islands of high polymorphism in the VGII group. 
Polymorphisms among VGII isolates are not homogeneous along 



the chromosomes. In contrast with the islands of diminished poly- 
morphism, we found SNP density was elevated locally, resulting in 
genomic islands of high polymorphism. For example, a region on 
R265 supercontig 13 located between 393 and 417 kb showed 3X 
to 4X more polymorphisms than the surrounding regions 
(Fig. 8B). This region coincides with two blocks of high linkage 
disequilibrium on supercontig 13 (Fig. 8A). We examined 
whether the increased polymorphisms were caused by the inclu- 
sion of particular VGII isolates and found that the high polymor- 
phisms between positions 393 and 408 kb are caused by two out- 
lying clonally related isolates (2001/935-1 and IP96/1120-1) 
(Fig. 8D). Removal of these two isolates reduced the polymor- 
phism to levels identical to the genomic surroundings (red area in 
Fig. 8C). Similarly, the high level of polymorphism in the region 
from 414 to 417 kb is caused by the presence of six clonal isolates 
from the VGIInt group (NT3, NT7, NT8, RDH2, RDH7, and 
MMRL2647) (Fig. 8D). Removal of these six isolates from the 
analysis drastically reduced the level of polymorphism in this re- 
gion (blue area in Fig. 8C). 
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Incongruent phylogeny on supercontig 13 indicates intro- 
gression into VGII isolates. To understand the causes of the re- 
gions of elevated polymorphism, we constructed a phylogeny 
based on sequence polymorphisms within the region of 393 to 
408 kb on supercontig 13. We included diverse C. gattii isolates of 
aU four VG types. The phylogeny resolved all four VG types of 
C. gattii as expected (Fig. 9B). However, isolates 2001/935-1 and 
IP96/1120-1 were placed incongruously next to VGI and VGIV 
instead of being clustered among other VGII isolates. Clade sup- 
port for these groupings was high (95% and 73% bootstrap sup- 
port, respectively). The incongruous groupings are indicative of 
an introgression event into isolates 2001/935-1 and IP96/1120-1 
from VGI C. gattii. In contrast, in the second block from 414 to 
417 kb, isolates from the VGIInt group were placed within VGII 
(data not shown). 

The affected region from 393 to 408 kb comprises three genes: 
CNBG_4871, CNBG_4872, and CNBG_4873 (Fig. 9A). All of 
these genes showed considerable amino acid polymorphism 
among C. gattii isolates (see Fig. S2 in the supplemental material). 



Isolates 2001/935-1 and IP96/1120-1 share amino acid polymor- 
phism with different VG groups at different positions in the pro- 
tein, suggesting that the introgressed region was affected by re- 
combination or the donor is an unknown VG type. In 
CNBG_4873, the two isolates also have an amino acid at position 
44 that was not found in any other C. gattii isolate. The three genes 
in the affected region from 393 to 408 kb are not yet functionally 
characterized in C. gattii. A homology search based on a translated 
nucleotide sequences identified CNBG_4871 as being a likely or- 
tholog of AV03 in Saccharomyces cerevisiae. Avo3 is part of the 
TORC2 complex that contains the Tor2 protein and is involved in 
polarization of the actin cytoskeleton. CNBG_4872 is predicted to 
encode the ATP-dependent DNA helicase Mphl that dissociates 
Rad51 D-loops. CNBG_4873 is predicted to encode the metal ho- 
meostasis protein Bsd2 with roles in metal ion transport and vac- 
uolar protein targeting. 

Introgressed isolate from an HIV""" patient is aneuploid. Both 
isolates 2001/935-1 and IP96/1120-1 do not belong to an outbreak 
VGII subtype but do group within the VGII clade (Fig. 8D). Both 
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isolates are of clinical origin and were isolated from HIV-positive 
patients from Africa. Isolate IP96/1120-1 was isolated in France 
from a patient who moved from Africa and isolate 200 1/935- 1 was 
isolated from a bronchoalveolar lavage sample from a Senegalese 
patient (20). Despite the near clonal nature of the two isolates 
2001/935-1 and IP96/1120-1, we found that IP96/1120-1 likely 
had multiple partial and complete chromosomal duplications not 



found in 2001/935-1. Read depth on supercontigs 6 and 13 indi- 
cated partially duplicated regions and read depth on supercontig 
14 indicated duplication of the entire region covered by this su- 
percontig (Fig. 10). Interestingly, the introgressed region on su- 
percontig 13 is found within a duplicated region of this supercon- 
tig. All three duplications contain only homozygous variants, 
suggesting the extra copies arose by duplicating the original chro- 
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mosomal region represented by the supercontig, rather than 
through mating with a different strain, although a-a mating or 
autodiploidization followed by sequential chromosome loss can- 
not be ruled out. Duplication of chromosomes readily occurs in 
Cryptococcus following exposure to antifungal compounds and 
during sexual reproduction (9, 29). Chromosomal disomy in 
IP96/1120-1 isolated from an HIV"*" patient may have been a di- 
rect consequence of antifungal treatment in the patient. 

In addition to the aneuploidy observed in IP96/1120-1, two 
additional isolates were found to be aneuploid (see Fig. S3 in the 
supplemental material). The VGIIb isolate B8828 is disomic 
across supercontig 13, while the French clinical isolate 93.980 is 
disomic across supercontig 14 and parts of the region aligning to 
supercontig 6 occur in 1, 2, 3, 4, 5, or 6 copies (see Fig. S4 in the 
supplemental material). These isolates are of clinical and veteri- 
nary origin, respectively, which may indicate that aneuploidy 
could be an adaptation to growth in an animal host. 

DISCUSSION 

Understanding the evolution of virulence in pathogens is key to 
understanding the development of emerging diseases. The sudden 
emergence of an outbreak of the previously primarily tropical 
pathogen C. gattii in the temperate Pacific Northwest in the late 
1990s raised a number of questions about the emergence of this 
outbreak, namely, the genetic and geographic origin of this patho- 
gen. 

Adaptation and emergence of virulence can take place through 
both small changes introduced through random mutation, as well 
as through reassortment of preexisting mutations in the popula- 
tion. The VGII C. gattii population shows examples of both types 
of adaptation. The VGIIa lineage is a result of small-scale mitotic 
adaptation from a shared last common ancestor with the strains 



we have termed VGIIa-like. These lineages 
display different virulence potentials, and 
the VGIIa outbreak strains have increased 
virulence relative to the VGIIa-like isolates 
ICB107, CBS7750, and NIH444 (20, 21). 
Both CBS7750 and NIH444 are closely re- 
lated to the VGIIa clonal outbreak cluster, 
and thus comparison of the genomes may 
allow identification of the critical changes 
impacting virulence that enabled the emer- 
gence of the VGIIa subclade of the Pacific 
Northwest outbreak. In fact, the first 
VGIIa-like isolate, NIH444, was isolated in 
Seattle in the 1970s. This raises the obvious 
question of why, if VGIIa has been present 
in the Pacific Northwest for so long, did the 
outbreak not begin until the late 1990s? 
Our whole-genome sequencing analysis 
suggests that microevolution may have led 
to heightened virulence and expansion of 
these isolates as an outbreak and likely oc- 
curred over a relatively short time (from 
-1970 to 1999) in the Pacific Northwest, 
although the ancestral origin of these iso- 
lates may lie in South America, along with 
that of the VGIIa-like isolate ICB107. 
There is precedent for dramatic changes in 
virulence through mitotic growth in the se- 
rially passaged H99 laboratory isolates (10). An alternative hy- 
pothesis is that both VGIIa and VGIIa-like strains arose some- 
where else and were both introduced to the Pacific Northwest, 
potentially at different times. We favor the explanation that VGIIa 
arose from the VGIIa-like lineage via mitotic mutation. The recent 
development of a congenic VGIIa mating pair (30) wiU allow ex- 
perimental crosses between R265a and the VGIIa-like isolates to 
test these and other hypotheses. 

Furthermore, our results suggest that the microevolution of 
VGIIa may have been mediated by a hypermutator phenotype as a 
result of an early frameshift mutation in the ortholog of MSH2, a 
key mismatch repair component. This may be a transient muta- 
tion that was repaired in the VGIIa lineage, after allowing short- 
term adaptation for higher virulence. In C. neoformans, there is 
evidence that hypermutators aid in adaptation to stress caused by 
amoeba (14); however, in this case the hypermutator strains 
themselves have diminished virulence in mice. This is counter to 
experiments done with signature-tagged mutants in C. neofor- 
mans, where both the msh2\ and mlhlA mismatch repair mutants 
appear to proliferate faster in mouse lungs in a signature-tagged 
mutant experiment (31). Further experiments will be necessary to 
test whether Msh2 contributes to virulence in VGII C. gattii. In 
this case, we suspect the main contribution of the mutation in 
MSH2 is not a direct effect on virulence, but instead it may have 
potentiated changes in the progenitor of the VGIIa group, result- 
ing in a more fit lineage that gave rise to the VGIIa outbreak 
strains. The msh2 frameshift mutation is only present in the 
VGIIa-like isolates that we hypothesize are the progenitors of the 
VGIIa outbreak lineage in which the MSH2 gene has been restored 
to wild type. Further experiments will be necessary to test these 
hypotheses. 

In contrast, the novel VGIIc type is, as a whole, restricted to the 
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Pacific Northwest, possibly only to the state of Oregon, and has 
dramatically enhanced virulence (21). This clonal lineage, unlike 
VGIIa, appears to have been generated as a new, highly virulent 
subtype via mating. The virulence phenotype of VGIIc exceeds 
that of any other VGII subtype tested (21), and it appears to be a 
clonal expansion of a single original strain in the PNW. An alter- 
native is that VGIIc exists but has not been sampled in another 
geographic region. Both models are possible, but we favor a recent 
origin for two reasons: (i) the high virulence potential of VGIIc 
suggests that we would likely observe human or veterinary cases if 
it were present in another highly populated locale, and (ii) ongo- 
ing sampling since the start of the PNW outbreak has been unable 
to identify a single VGIIc, or even VGIIc-like, isolate outside the 
PNW. Additional sampling may reveal other VGIIc isolates in 
other parts of the world, but this lineage appears to be the result of 
a meiotic event resulting in a more fit progeny. 

The proximal geographic source of the VGII outbreak has been 
debated. Hypotheses have included an origin in Australia (20) as 
well as in South America (20, 22). Data from whole-genome anal- 
ysis suggest that both hypotheses may be at least partially correct. 
In fact, it appears that VGIIa arose in its most recent, virulent form 
in the Pacific Northwest, although it likely originated ancestrally 
in South America, and VGIIb came to the Pacific Northwest by 
way of Australia, although VGIIb-like isolates exist in the Carib- 
bean and now in Florida (32) within the United States. In contrast, 
VGIIc arose in either the Pacific Northwest or some unsampled 
location elsewhere in the world. An origin other than the PNW 
seems less likely given the high virulence potential of VGIIc, unless 
it lies in a pristine niche to which the human population and other 
animals have not yet been exposed. Notably, whole-genome data 
suggest that each of the molecular types of the PNW population 
may have been introduced independently and possibly at different 
times. Sequencing of additional South American VGIIa-like iso- 
lates, as well as additional global VGIIb-like isolates, will help 
increase confidence in these hypotheses. 

As a whole, the VGII group is robustly recombining at the 
population level based on a standard allele compatibility test. 
Within our set of samples, multiple loci from the randomly se- 
lected set show evidence for sexual recombination that rely on 
involvement of the clonal outbreak groups from the Pacific 
Northwest to complete the allele compatibility test. We cannot be 
certain whether these groups were the parents in such a putative 
cross or the offspring. Likewise, we cannot be certain that addi- 
tional sampling will not reveal additional strains that may have 
these allele pairs. However, in combination with the presence of 
islands of identity within the genome, this provides strong evi- 
dence that the clonal outbreak lineages are the result of sexual 
reproduction, at least part of which resulted in shared genomic 
portions between these clonal outbreak groups and their ancestral 
progenitors. 

Additionally, while we cannot rule out a-a opposite sex mat- 
ing, it is likely that a-a unisexual mating is contributing to recom- 
bination within the VGII group globally. Outside of South Amer- 
ica, there arze very few MATa isolates, and there are multiple 
genotypes that have not been isolated from South America, de- 
spite relatively extensive sampling. In this study, we sequenced 
three MATa isolates representing two MLST types. These isolates 
were more similar than predicted by the MLST approach and 
likely represent a single progeny from a cross that has mitoticaUy 
expanded over time, analogous to the VGIIa group. Notably, these 



isolates have previously been shown to be fertile in an a-a labora- 
tory cross (21), but recombination appears to be rare in the envi- 
ronment. 

The population genomic and phylogenetic analyses of super- 
contig 13 strongly suggest that the genomes of isolates 2001/935-1 
and IP96/1 120-1 contain introgressed sequences. A very recent 
introgression event is expected to leave a mosaic structure of mul- 
tiple introgressed sequences in the genome, although introgres- 
sion events across a species boundary can result in single intro- 
gressed tracts. The highly localized nature of the discordant 
phylogenetic signals and evidence for recombination among hap- 
lotypes in the introgressed tract suggests that the introgression 
event did not occur in very recent generations, or occurred across 
a species boundary. The source of introgression is most likely an 
isolate from the VGI group. Introgression from related VG groups 
into VGII is likely as at least VGII and VGIII can produce viable 
progeny and transmit hypervirulence traits (33). The strong link- 
age disequilibrium and restricted introgression tract suggest that 
selection favored the fixation of the introgressed sequences (34). 

Genomes from the VGII C. gattii isolates suggest that it is a 
population heavily characterized by ancestral sexual recombina- 
tion, including evidence for introgression from other VG types, 
but also by more recent clonal expansions. We found that neither 
of the two original components of the VGII outbreak in the PNW 
appears to have been the immediate product of sexual reproduc- 
tion. Instead, VGIIa may have arisen in South America and un- 
dergone expansion, driven by a transient mutator, to become 
more virulent in the PNW, and VGIIb appears to have arrived in 
the PNW largely unchanged from Australia. In contrast, VGIIc, as 
a result of its uniqueness to the PNW and its novel high virulence, 
may have arisen through a recent mating in the Pacific Northwest. 
This provides evidence that mitosis -driven mutations, migration, 
and sexual reproduction can all contribute to the development of 
an outbreak. 

MATERIALS AND METHODS 

Genome sequencing. DNA was prepared using the cetyltrimethylammo- 
nium bromide (CTAB) isolation method as previously described (28). 
Library construction and genome sequencing were performed at the Uni- 
versity of North Carolina Next Generation Sequencing Facility using a 
Hiseq2500. Isolates were multiplexed and run with 16 samples in each 
lane. Paired-end sequence libraries with approximately 300-base inserts 
were constructed and sequenced to a length of 100 bp per end. The lUu- 
mina Pipeline (v.1.8.2) was employed for initial processing. 

Reference-based assembly. Sequences were mapped to the R265 ref- 
erence genome (35) using the short read component of the BWA aligner 
(36). Further refinement was performed using the Genome Analysis Tool- 
kit (GATK) (37) pipeline version 2.4-9, including SAMtools (38) and 
Picard to convert from sam to bam format and to clean, sort, fix read 
groups, and mark duplicates, respectively. Indels were also realigned using 
GATK's IndelRealigner. SNPs and indels were called with the GATK Uni- 
fied Genotyper, using the haploid ploidy setting. SNP filtering was per- 
formed using VCFtooIs (39) to remove all SNP calls with missing data in 
at least one sample, a call quality below a PHRED score of 30, or a mean 
depth below 10. This eliminated about 7% of the initial calls, almost en- 
tirely as a result of the missing data parameter. This data set was analyzed 
using SnpEff (46). For some analysis, including the MAT locus, the orig- 
inal unfiltered set was used because MATa isolates contributed missing 
sites across much of the MAT locus. In addition, to examine recombina- 
tion within the clonal groups, SNPS were manually filtered by visually 
examining the BAM files using IGV for clear evidence of misalignment of 
reads to eliminate obviously erroneous calls. For the introgression analy- 
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ses, additional outgroup isolate data sets from VGI, VGII, VGIII, and 
VGIV were downloaded from the Short Read Archive and analyzed as 
described above (see Table S2 in the supplemental material). 

Inference of phylogeny. Maximum likelihood phylogenies were con- 
structed using MEGA5 (40) either after alignment using Kalign (41) on 
artificial fasta files using GATK's FastaAlternateReferenceMaker or from 
SNP matrices extracted from the VCF format using a custom Perl script, 
included in Text SI in the supplemental material. Bootstrap analysis was 
based on a total of 500 replicates. MEGA trees were midpoint rooted. 
Support was calculated using 500 bootstraps. Phylogenetic analyses of 
introgression tracts were performed using RAxML 8.0.20 (42) using the 
GTRCAT nucleotide substitution model and rapid bootstrap analysis for 
500 replicates. 

Haplotype network inference Haplotype networks were constructed 
using TCS 1.21 (43) from SNP matrices within the clonal lineages. This 
relies on the assumption that the strains were not separated by meiotic 
events, which we found evidence for from paired allele comparisons, and 
essentially treats the entire genome of each clonal group as a single hap- 
lotype. Indels were not used for this analysis to eliminate alignment errors. 

Linkage disequilibrium analyses. Chromosome-wide linkage dis- 
equilibria {R^) were calculated for all SNP markers with a minor allele 
frequency of at least 0.03 using VCFtools version 0.0.12a (39) and visual- 
ized with the R statistical software (44). Script has been included in 
Text S 1 in the supplemental material. 

Whole-genome allele compatibility tests. One hundred random 
SNPs out of the 262,614 SNPs in the call set were selected using the ran- 
dom selection component of GATK's SelectVariants walker. These SNPs 
were then collapsed into 46 unique allele patterns. Each pattern was com- 
pared pairwise with each other pattern to test for the presence of all four 
allele combinations. Formally, this is a test that all four products of mei- 
osis are present: AB, Ab, aB, and ab. This analysis assumes that homoplasy 
is rare, an assumption that is supported by the low frequency of multial- 
lelic SNPs in the data set, which should be more frequent than homoplasic 
mutations. 

Nucleotide sequence accession number. Sequences have been de- 
posited into the Sequence Read Archive under project accession no. 
SRP044232. 
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